Section 3 describes the development of explanatory variables (features) using the Polymer class.
In addition to Polymetrics, the following libraries are imported:
numpy and pandas - for data analysis
plotly - for plotting heatmaps
import pandas as pd
import numpy as np
import Polymetrics as poly
import FileImport
import traceback
import plotly
import plotly.express as px
import plotly.graph_objects as go
import plotly.io as pio
pio.renderers.default = "jupyterlab"
plotly.offline.init_notebook_mode()
The data used in the example is taken from patent US2013/0046061 (Hermel-Davidock et al.). Features are developed for the inventive and the comparative samples.
The XLSXImport function prepares the imported polymer objects for further processing.
df_in = FileImport.XLSXImport("Article/Example_Dataset.xlsx", sheet_name = 'Data')
print(df_in.columns)
Index(['Identifier', 'Name', 'UID', 'Project', 'Classification', 'Type',
'Density', 'Tc', 'delHc', 'Tm', 'delHm', 'Mn', 'Mw', 'Mz', 'CDC',
'ZSVR', 'I2', 'I10', 'Unsat_1M_C', 'CEF_FileName',
'FilmFormulation_FileName', 'MFR_180C', 'CEF_Data'],
dtype='object')
Explanatory variables are developed for each polymer object by looping through the polymer objects sequentially. The additional features developed from the experimental data can then be combined with the other explanatory variables to form a features array.
df_pat = df_in[(df_in['Project'] == 'US20130046061') & (df_in['Type'] == 'Resin_Developmental')]
Features = pd.DataFrame(columns = df_pat.index) # Original index-related information is retained.
# Pandas is more forgiving towards the dynamic allocation of rows than of columns, so the columns are preassigned.
# The alternatives below raise ValueError('Must have equal len keys and value when setting with an iterable'):
#Features = pd.DataFrame(index = df_pat.index)
#Features = pd.DataFrame()
for i, polymer_ in zip(df_pat.index, df_pat.to_dict(orient="records")):
    PE = poly.Polymer(polymer_)
    try:
        # Feature building
        BasicStats_dict = PE.BasicStats(Interpolate = True, minT = 30.0, maxT = 110.0)
        # Feature-generating functions return dictionaries.
        Features_dict = {**BasicStats_dict} # Creates a union of N dictionaries via unpacking.
        Features.loc[:, i] = pd.Series(Features_dict)
    except Exception as error:
        print('Polymetrics Error at', polymer_['Identifier'], repr(error))
        #traceback.print_exc()
Features = Features.T # Transpose of the Features DataFrame for a row-wise arrangement of polymer objects.
# This step can change the element datatype from float to object.
Features = Features.astype('float64')
#print(Features.dtypes)
Polymetrics Error at CS4 AttributeError("'Polymer' object has no attribute 'df_CEF'")
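The dtype change mentioned above can be reproduced in isolation. The sketch below uses made-up feature dictionaries standing in for the BasicStats output; the hypothetical sample 'CS4' simulates a polymer whose feature generation failed, so its column is never assigned and the transposed frame falls back to object dtype:

```python
import pandas as pd

# Made-up feature dictionaries; 'CS4' simulates a failed sample.
records = {
    'IS1': {'Mean': 65.2, 'STDEV': 12.1},
    'IS2': {'Mean': 70.4, 'STDEV': 9.8},
    'CS4': None,
}

Features = pd.DataFrame(columns=list(records))  # columns preassigned
for i, feats in records.items():
    if feats is None:
        continue                    # failed sample: column left unset
    Features[i] = pd.Series(feats)  # one column per polymer object

Features = Features.T               # row-wise arrangement
dtypes_after_T = Features.dtypes    # the never-set 'CS4' column forces object dtype
Features = Features.astype('float64')
print(dtypes_after_T.unique(), Features.dtypes.unique())
```

The never-assigned column keeps the object dtype a new empty column receives, which propagates to every column on transpose; astype('float64') restores numeric dtypes (NaN rows survive and are removed later by dropna).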
J = df_pat.select_dtypes(np.number)
Data = pd.concat([J, Features, df_pat['Classification']], axis=1)
Data.dropna(inplace = True, how = 'any', axis = 0)
FeaturesA = Data.drop('Classification', axis = 1)
fig = px.imshow(FeaturesA.corr(method = 'spearman'),
                color_continuous_scale=px.colors.diverging.BrBG,
                color_continuous_midpoint=0,
                title = 'Correlation Matrix')
fig.show()
XX = poly.drop_correlated(FeaturesA, coeff = 0.8, Retain = ['COV', 'delHm'], Drop = ['Density', 'Tc', 'delHc', 'Tm'], Plot = False)
fig = px.imshow(XX.corr(method = 'spearman'),
                color_continuous_scale=px.colors.diverging.BrBG,
                color_continuous_midpoint=0,
                title = 'Correlation Matrix')
fig.show()
Correlated variables in remaining data to drop: ['Mz', 'ZSVR', 'I2', 'I10', 'Mean', 'STDEV', 'Median', 'MAD', 'MedianAD', 'AUC']
Variables correlated with the variables to retain: ['CDC', 'IQR', 'Mn']
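The pruning performed by drop_correlated can be sketched generically. The helper below is an assumption-laden stand-in, not the Polymetrics implementation: it scans the upper triangle of the absolute Spearman correlation matrix and drops the later column of each pair above the cutoff, unless that column appears in Retain. The demo frame and its column names are invented for illustration:

```python
import numpy as np
import pandas as pd

def drop_correlated_sketch(df, coeff=0.8, Retain=()):
    # Absolute Spearman correlation matrix.
    corr = df.corr(method='spearman').abs()
    # Keep only the strict upper triangle so each pair is inspected once.
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    # Drop the later member of each highly correlated pair, unless retained.
    to_drop = [c for c in upper.columns
               if c not in Retain and (upper[c] > coeff).any()]
    return df.drop(columns=to_drop)

rng = np.random.default_rng(0)
x = rng.normal(size=200)
demo = pd.DataFrame({'A': x,
                     'B': 2 * x + 0.01 * rng.normal(size=200),  # near-duplicate of A
                     'C': rng.normal(size=200)})                # independent
print(list(drop_correlated_sketch(demo, coeff=0.8).columns))
```

'B' is monotonically related to 'A', so its Spearman coefficient exceeds the cutoff and it is removed, while the independent 'C' survives; passing Retain=('B',) would keep it instead.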